Skip to content

Conversation

@akshayutture-augment
Copy link
Contributor

* remove the use of client side cache for in-proc authz client

Co-authored-by: Gabriel MABILLE <[email protected]>

* add a permission denial cache, fetch perms if not in either of the caches

Co-authored-by: Gabriel MABILLE <[email protected]>

* Clean up tests

Co-authored-by: Ieva <[email protected]>

* Cache tests

Co-authored-by: Ieva <[email protected]>

* Add test to list + cache

Co-authored-by: Ieva <[email protected]>

* Add outdated cache test

Co-authored-by: Ieva <[email protected]>

* Re-organize metrics

Co-authored-by: Ieva <[email protected]>

---------

Co-authored-by: Gabriel MABILLE <[email protected]>
@akshayutture-augment
Copy link
Contributor Author

@claude review this PR

1 similar comment
@akshayutture-augment
Copy link
Contributor Author

@claude review this PR

@claude
Copy link

claude bot commented Nov 20, 2025

Claude finished @akshayutture-augment's task —— View job


PR Review Complete

PR: AuthZService: improve authz caching

I've reviewed the changes in this PR which improves caching for the authorization service. Here's my comprehensive review:


Summary of Changes

This PR implements two key caching improvements:

  1. Removes client-side caching for in-process authz clients (using NoopCache)
  2. Adds a permission denial cache to avoid redundant database lookups for denied permissions
  3. Refactors cache checking logic to check cache first, then fall back to database

Positive Aspects ✅

  1. Smart denial caching: The new permDenialCache (lines 116-120 in service.go) prevents repeated database queries for permissions that were already denied. This is a clever optimization for negative results.

  2. Excellent test coverage: The PR adds comprehensive test cases for:

    • Cache hit scenarios
    • Cache miss/fallback scenarios
    • Outdated cache scenarios
    • Explicit denial cache behavior
  3. Proper cache key specificity: The userPermDenialCacheKey includes all relevant parameters (namespace, userUID, action, name, parent) to ensure cache correctness (cache.go:30-32).

  4. Backward-compatible refactoring: The logic reorganization maintains the same external behavior while improving performance.


Issues & Concerns ⚠️

1. Potential Cache Inconsistency Issue (High Priority)

Location: service.go:116-121 and service.go:153-156

The denial cache entry is set when allowed == false, but this creates a potential issue:

cachedPerms, err := s.getCachedIdentityPermissions(ctx, checkReq.Namespace, checkReq.IdentityType, checkReq.UserUID, checkReq.Action)
if err == nil {
    allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
    // If cached perms exist but permission is denied, we return early WITHOUT setting denial cache
    if allowed {
        return &authzv1.CheckResponse{Allowed: allowed}, nil
    }
}
// Denial cache only set here (line 153-156)
if !allowed {
    s.permDenialCache.Set(ctx, permDenialKey, true)
}

Problem: If cached permissions exist but don't grant access, the code returns early at line 122-135 without setting the denial cache. The denial cache is only set at lines 153-156 when fetching fresh permissions from the database. This means:

  • First check with cached perms → denied, no denial cache set
  • Second identical check → fetches from cache again, checks perms again (not optimized)

Recommendation: Set the denial cache in both paths when access is denied:

if err == nil {
    allowed, err := s.checkPermission(ctx, cachedPerms, checkReq)
    if err != nil { ... }
    if allowed {
        // Cache hit and allowed
        return &authzv1.CheckResponse{Allowed: allowed}, nil
    } else {
        // Cache hit but denied - set denial cache here too
        s.permDenialCache.Set(ctx, permDenialKey, true)
        return &authzv1.CheckResponse{Allowed: false}, nil
    }
}

2. Missing Cache Invalidation Strategy (Medium Priority)

Location: Throughout caching implementation

The PR doesn't address cache invalidation when:

  • User permissions are updated
  • Team memberships change
  • Roles are modified
  • Folders are moved/deleted

The denial cache with a 30-second TTL (shortCacheTTL) mitigates this somewhat, but users could experience up to 30 seconds of stale permissions after changes.

Recommendation: Document the cache invalidation strategy or consider implementing cache invalidation hooks for permission changes.

3. NoopCache for In-Process Client (Low Priority)

Location: rbac.go:101-105

The PR changes the in-process RBAC client to use NoopCache instead of the local cache:

rbacClient := authzlib.NewClient(
    channel,
    authzlib.WithCacheClientOption(&NoopCache{}),  // Was: local cache
    authzlib.WithTracerClientOption(tracer),
)

Question: Why disable client-side caching for in-process clients? The commit message says "remove the use of client side cache for in-proc authz client" but doesn't explain the rationale. Is this because:

  • The server already has caching, making client cache redundant?
  • Double caching was causing issues?
  • Memory concerns?

Recommendation: Add a code comment explaining why NoopCache is used for in-process clients.

4. Inconsistent Early Return in Check() Method (Low Priority)

Location: service.go:116-135

The early cache check logic is slightly inconsistent with error handling:

  • Denial cache check (lines 116-120): Returns immediately on cache hit
  • Permission cache check (lines 123-135): Returns immediately only if allowed == true

When cachedPerms exist but deny access, the code continues to fetch fresh permissions from the database. This seems intentional (to handle outdated cache), but it means cached denials are treated differently than explicit denial cache entries.

Recommendation: Clarify the intended behavior - should cached denials also be trusted, or is the current "fetch fresh on cache denial" behavior intentional as a safety mechanism?


Minor Issues 🔍

5. Test Removal Without Clear Justification

Location: service_test.go:342-350

A test case was removed:

// REMOVED:
{
    name: "should return permissions from cache if available",
    permissions: []accesscontrol.Permission{
        {Action: "dashboards:read", Scope: "dashboards:uid:some_dashboard"},
    },
    cacheHit:      true,
    expectedPerms: map[string]bool{"dashboards:uid:some_dashboard": true},
},

This test verified cache behavior in getUserPermissions(). After refactoring, the cache check moved to getCachedIdentityPermissions(), which is tested elsewhere. However, removing this test slightly reduces coverage of the full flow.

Recommendation: Consider keeping a similar test or document why it's redundant.

6. Code Cleanup Artifact

Location: rbac.go:160

The function newRBACClient() was removed and its logic inlined, leaving this slightly awkward code structure:

func newRemoteRBACClient(clientCfg *authzClientSettings, tracer trace.Tracer) (authlib.AccessClient, error) {
    // ... connection setup ...
    
    client := authzlib.NewClient(...)  // created but immediately returned
    return client, nil
}

Recommendation: Minor - could directly return the result of authzlib.NewClient(), but current code is fine.


Performance Considerations 📊

Positive Impact:

  • Denial cache should significantly reduce database load for unauthorized access attempts
  • Removing redundant client-side cache for in-process clients reduces memory overhead

Potential Concerns:

  • Cache checking overhead is minimal (map lookups)
  • The additional denial cache increases memory usage slightly, but with 30s TTL this should be negligible

Code Quality 📝

Strengths:

  • Clean separation of concerns with getCachedIdentityPermissions()
  • Good use of metrics for observability
  • Comprehensive test coverage of caching behavior

Areas for Improvement:

  • Missing code comments explaining the caching strategy
  • No documentation of cache TTL choices (why 30s vs 2min?)

Recommendations Summary

Must Address:

  1. ⚠️ Fix the cache inconsistency issue where denial cache isn't set when using cached permissions (Unified Storage Performance Optimizations #1 above)

Should Consider:
2. Document cache invalidation strategy
3. Add comment explaining NoopCache usage for in-process clients
4. Clarify the intended behavior for cached permission denials

Nice to Have:
5. Add inline comments explaining the caching strategy
6. Document why specific TTL values were chosen


Verdict

This PR makes solid improvements to caching performance, particularly with the denial cache optimization. The test coverage is excellent. However, there's a potential cache inconsistency issue (point #1) that should be addressed before merging to ensure the denial cache works as intended in all scenarios.

Recommendation: Request changes to address the cache inconsistency issue, then approve.


Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants